

A wireless sensor network (WSN) consists of a large number of spatially distributed signal processing devices (nodes), each with finite battery lifetime and thus limited computing and communication capabilities. When properly programmed and networked, nodes in a WSN can cooperate to perform advanced signal processing tasks with unprecedented robustness and versatility, making WSNs an attractive low-cost technology for a wide range of remote sensing and environmental monitoring applications [1], [32].
Prolonging the lifetime of a WSN is important for both commercial and tactical applications. With nonrechargeable batteries, this requirement places stringent energy constraints on the design of all WSN operations. Energy limitation is one of the major differences between a WSN and other wireless networks such as wireless local area networks, where energy efficiency is of lesser concern. Also, WSNs are often self-configured networks with little or no pre-established infrastructure as well as a topology that can change dynamically. Moreover, there may be physical obstacles in the network environment that can considerably degrade the wireless links among sensors. All these present formidable challenges to the design of communication, networking, and local signal processing algorithms performed by a WSN. In this article, we focus on distributed estimation tasks performed by a WSN under energy and bandwidth constraints.

Since data are collected by sensors at geographically distinct locations, estimation using a WSN requires not only local information processing but also intersensor communications. The latter brings in a wireless communication and networking aspect of the problem that is absent from the traditional centralized estimation framework. In fact, a major challenge in WSN research is the integrated design of local signal processing operations and strategies for intersensor communication and networking so as to strike a desirable tradeoff among energy efficiency, simplicity, and overall system performance. For instance, to maximize battery lifetime and reduce communication bandwidth, it is essential for each sensor to locally compress its observed data so that only low-rate intersensor communication is required. This motivates joint design of the compression-estimation module per sensor.

Designing  distributed  compression-estimation  algorithms  in 
the  context  of  a  WSN  differs  from  the  traditional  centralized 
framework  in  several  important  aspects. 

■  Constraints  on  sensor  cost,  bandwidth,  and  energy  budget 
dictate  that  low  quality  sensor  observations  may  have  to  be 
aggressively  quantized,  e.g.,  down  to  a  few  bits  per  sample  per 
node.  Thus,  estimators  must  be  developed  based  on  severely 
quantized  versions  of  very  noisy  observations. 


■  Obtaining  the  complete  signal  models  for  a  large  number 
of  sensors  may  be  impractical,  particularly  in  dynamic 
sensing  environments.  This  preempts  application  of  optimum 
estimation  algorithms  and  motivates  distributed  estimators 
based  on  partially  known  or  unknown  data/noise  models. 

■ Sensors may enter or leave the network dynamically, resulting in unpredictable changes in network size and topology. Thus, to ensure robust operation, compression-estimation algorithms for WSNs have to work with limited (or no) knowledge of the network topology and/or size.

■ Local compression at a sensor node depends not only on the quality of sensor observation, but also on the quality of the wireless communication channel(s) from the node.

In  addition,  the  design  of  distributed  algorithms  should  be 
coupled  with  the  underlying  WSN  topology.  We  consider  two 
popular  WSN  deployments  characterized  by  the  presence  or 
absence  of  a  fusion  center  (FC). 

■ When an FC is present, there is no intersensor communication; communication is only between sensors and the FC. The FC collects locally processed data and produces a final estimate; see Figure 1.

■  In  ad  hoc  WSNs,  there  is  no  FC.  The  network  itself  is 
responsible  for  processing  the  collected  information,  and  to 
this  end,  sensors  communicate  with  each  other  through  the 
shared  wireless  medium;  see  Figure  2. 

Hybrids  are  also  possible  in  which  the  WSN  is  partitioned  into 
clusters  possibly  with  a  hierarchical  structure.  Each  cluster  has 
a  local  FC  generating  intermediate  estimates,  which  in  turn  are 
combined  to  obtain  a  final  estimate. 

The focus of this article is on distributed compression and estimation using WSNs in which the main design goals are performance, bandwidth efficiency, scalability, and robustness to changes in the network or environment. (Distributed detection in WSNs is discussed in [10].) We first pursue deterministic parameter estimators and study the intertwining tasks of quantization and estimation in i) low signal-to-noise ratio (SNR) situations, where the noise standard deviation is on the order of the parameter's dynamic range, and ii) universal estimation, where the sensor data and noise model are unknown. The ultimate objective is to understand how the signal processing capability of a WSN scales up with its size and to develop robust distributed signal processing algorithms and protocols with low bandwidth requirements and optimal performance. We will see that in the low SNR regime, universal distributed estimators not only exist but also achieve performance close to that of estimators based on the original (nonquantized) observations. Moreover, since network resources (e.g., power and bandwidth) are scarce, their optimal allocation and scheduling can lead to significant savings.

[FIG1] A WSN topology with an FC.

[FIG2] Ad hoc WSN.

The techniques and basic results that are derived for the parameter estimation paradigms outlined first are later extended to more general and practical signal models. A Bayesian estimation framework is laid out along with an application to state estimation of dynamical stochastic processes. The final part of the article addresses several issues pertaining to WSNs with an FC from an information theoretic point of view. These properties not only offer performance benchmarks for distributed signal processing but also provide general guidelines for algorithmic designs.

DISTRIBUTED  ESTIMATION  FRAMEWORK 

Let us consider a generic distributed estimation problem using a WSN with an FC. Our goal is to estimate a $p \times 1$ vector parameter $\boldsymbol{\theta} \in \mathbb{R}^p$ from $K$ independent scalar observations collected by as many distributed sensors, as depicted in Figure 3. The observations obey the model

$$x_k = \phi_k(\boldsymbol{\theta}) + w_k, \tag{1}$$

where $\phi_k : \mathbb{R}^p \to \mathbb{R}$ is generally a nonlinear function and the noise terms $w_k$, $k = 1, \ldots, K$, are zero-mean independent random variables with variance $\sigma_k^2 := E(w_k^2)$. Let $p_k(w)$ be the probability density function (pdf) of $w_k$ and $F_k(w) := \int_w^{\infty} p_k(u)\,du$ denote the corresponding complementary cumulative distribution function (ccdf). Although our focus is on the parameter estimation problem in (1), the methods here can be extended to nonparametric models as well. Interested readers are also referred to [35], which discusses robust nonparametric methods using distributed learning.

Distributed estimation using a WSN entails a local compression stage in which sensors perform local quantization of their observations to obtain finite-rate messages $m_k(x_k)$. These messages are then sent to the FC where a final estimate $\hat{\boldsymbol{\theta}} = \Gamma(m_1, \ldots, m_K)$ is generated.

[FIG3] Distributed estimation setup.

If infinite bandwidth were available, each sensor could send its analog-amplitude observation $x_k$ to the FC, corresponding to the setup discussed in the previous paragraph with $m_k(x_k) = x_k$. Upon receiving these real-valued messages, the FC can use any of a number of estimation techniques, depending on the extent of its prior knowledge about the pdf $p_k(w)$, to generate an optimal (in some statistical sense) estimate $\hat{\boldsymbol{\theta}}_0 = \Gamma_0(x_1, \ldots, x_K)$. If, for example, $\boldsymbol{\theta}$ is scalar (denoted by $\theta$) and we consider the simple signal model $x_k = \theta + w_k$, a popular approach is to compute the best linear unbiased estimator (BLUE) $\hat{\theta}_0 = \hat{\theta}_{\mathrm{BLUE}} := \left(\sum_{k=1}^{K} x_k/\sigma_k^2\right) / \left(\sum_{k=1}^{K} 1/\sigma_k^2\right)$, whose mean-square error (MSE) [24]

$$E\left[(\hat{\theta}_{\mathrm{BLUE}} - \theta)^2\right] = \left(\sum_{k=1}^{K} \frac{1}{\sigma_k^2}\right)^{-1} \tag{2}$$

is minimum among all linear unbiased estimators. If, furthermore, the noise at sensor $k$ adheres to a Gaussian pdf $\mathcal{N}(0, \sigma_k^2)$, then $\hat{\theta}_{\mathrm{BLUE}}$ is the minimum variance unbiased estimator (MVUE) that minimizes the estimator variance $\mathrm{var}(\hat{\theta})$ for all values of $\theta$. In the particular case $\sigma_k^2 = \sigma^2$ for all $k = 1, 2, \ldots, K$, we obtain the sample mean estimator $\bar{x} := (1/K) \sum_{k=1}^{K} x_k$, whose MSE is known to be $\sigma^2/K$.

Having each sensor send the analog-amplitude $x_k$ to the FC may violate the severe bandwidth and power constraints that sensors are envisioned to obey. In such cases, it may be preferable to let each sensor transmit a quantized version of $x_k$ to the FC in the form of a finite-rate message $m_k(x_k)$, so that the FC can form the estimator $\hat{\theta} = \Gamma(\tilde{m}_1, \ldots, \tilde{m}_K)$ based on the received messages $\tilde{m}_1, \ldots, \tilde{m}_K$ (which are versions of $m_1(x_1), \ldots, m_K(x_K)$ corrupted by the noisy channel). Naturally, the MSE performance of $\hat{\theta} = \Gamma(\tilde{m}_1, \ldots, \tilde{m}_K)$ is in general inferior to that of $\hat{\theta}_0 = \Gamma_0(x_1, \ldots, x_K)$ due to quantization- and channel-induced errors. Even though the optimal centralized estimator $\hat{\theta}_0 = \Gamma_0(x_1, \ldots, x_K)$ based on analog-amplitude observations may be impractical in a WSN context, it serves as a useful clairvoyant benchmark to evaluate the performance of distributed estimators $\hat{\theta}$.

Our goal in the remaining sections is to

i) derive efficient local quantization schemes $m_k(x_k)$ and distributed estimators $\hat{\theta} = \Gamma(\tilde{m}_1, \ldots, \tilde{m}_K)$ under energy and bandwidth constraints

ii) benchmark their MSE performance and quantify the performance loss when compared to the centralized clairvoyant estimators $\hat{\theta}_0$

iii) ensure low-complexity $\{\Gamma, m_k : k = 1, \ldots, K\}$ alternatives

iv) design adaptive resource (e.g., power and bandwidth) allocation and scheduling strategies to improve the overall network performance.

DISTRIBUTED ESTIMATORS

In this section, we will present distributed estimators for i) known univariate noise pdfs, ii) known noise pdfs with a finite number of unknown parameters, iii) completely unknown noise pdfs, and iv) generalizations to multivariate and possibly correlated pdfs. Even though the estimators will turn out to require minimal communication overhead from the sensors to the FC, they will exhibit essentially identical MSE performance and comparable complexity with the corresponding clairvoyant estimators.

COMPLETELY KNOWN PDF

Let us start by considering the signal model

$$x_k = \theta + w_k \tag{3}$$

when the noise pdf $p_k(w) = p_w(w)$ for all $k$ and $p_w(w)$ is known. Albeit simple, this model will illustrate basic properties that carry over to more pragmatic models we will consider later. For simplicity, we will impose a rate constraint of one binary bit per sensor sample, but our results can be easily extended to any fixed number of bits per sensor sample. For binary messages (i.e., $L_k = 1$), we can consider the half-line $B_c := (\tau_c, \infty) \subset \mathbb{R}$ and define the message functions as $m_k(x_k) = \mathbf{1}\{x_k \in (\tau_c, \infty)\}$, indicating whether the observation $x_k$ belongs to $B_c$ or not. Moreover, we make the simplifying assumption that the channels from the sensors to the FC are ideal, so that $\tilde{m}_k = m_k(x_k)$ for all $k$.

[FIG4] Performance penalty of a 1-b estimator with respect to the sample mean estimator. When the parameter's dynamic range is on the order of the observation noise variance, the ratio $\mathrm{var}(\hat{\theta})/\mathrm{var}(\hat{\theta}_0)$ is not large.

Given that the noise is i.i.d. and the noise ccdf $F_w(w)$ is known, it is easy to find the maximum likelihood estimator (MLE) $\hat{\theta}_{\mathrm{MLE}} = \Gamma_{\mathrm{MLE}}(m_1, \ldots, m_K)$ [6]. Indeed, since $m_k$ is an indicator variable, it is Bernoulli distributed with parameter given by the probability $q := \Pr\{x_k \in B_c\} = F_w(\tau_c - \theta)$. As $q$ and $\theta$ are related by a one-to-one function and the MLE of $q$ is $\hat{q} = (1/K) \sum_{k=1}^{K} m_k$, we deduce from the invariance property of MLEs the closed-form expression [34], [38]

$$\hat{\theta}_{\mathrm{MLE}} = \tau_c - F_w^{-1}\left(\frac{1}{K} \sum_{k=1}^{K} m_k\right). \tag{4}$$

Although $m_k$ is a discontinuous function of $x_k$, $\hat{\theta}_{\mathrm{MLE}}$ is an estimator whose computational cost is on the order of that of optimal clairvoyant estimators such as the sample mean estimator $\bar{x}$ in (2).
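As a numerical illustration of the closed form $\hat{\theta}_{\mathrm{MLE}} = \tau_c - F_w^{-1}(\hat{q})$, the following minimal sketch assumes zero-mean Gaussian noise with known standard deviation, so that $F_w^{-1}(q) = \sigma Q^{-1}(q)$. The function names, the parameter values, and the clipping of $\hat{q}$ away from 0 and 1 are assumptions made for the example and are not part of the article.

```python
import random
from statistics import NormalDist


def one_bit_mle(messages, tau_c, sigma):
    """Estimator (4) for zero-mean Gaussian noise with known sigma."""
    K = len(messages)
    q_hat = sum(messages) / K
    # Assumption: keep q_hat inside (0, 1) so the inverse ccdf stays finite.
    q_hat = min(max(q_hat, 0.5 / K), 1.0 - 0.5 / K)
    # Gaussian ccdf: F_w(w) = Q(w / sigma), hence F_w^{-1}(q) = sigma * Q^{-1}(q),
    # and Q^{-1}(q) is the standard normal quantile evaluated at 1 - q.
    return tau_c - sigma * NormalDist().inv_cdf(1.0 - q_hat)


def simulate(theta=1.0, sigma=1.0, tau_c=0.5, K=20000, seed=0):
    """Return the 1-b estimate and the clairvoyant sample mean."""
    rng = random.Random(seed)
    x = [theta + rng.gauss(0.0, sigma) for _ in range(K)]
    m = [1 if xk > tau_c else 0 for xk in x]  # m_k = 1{x_k in (tau_c, inf)}
    return one_bit_mle(m, tau_c, sigma), sum(x) / K


theta_1bit, theta_mean = simulate()
print(theta_1bit, theta_mean)
```

Both estimates use the same $K$ observations; only the first one is computed from a single bit per sensor.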

The Cramér-Rao lower bound (CRLB) for estimating $\theta$ based on $\{m_k\}_{k=1}^{K}$ provides a performance limit for the variance of any unbiased estimator $\hat{\theta} = \Gamma(m_1, \ldots, m_K)$, and it is achieved by $\hat{\theta}_{\mathrm{MLE}}$ for $K$ sufficiently large. For our problem, the CRLB is given by [38]

$$B(\theta) := \frac{1}{K}\,\frac{F_w(\tau_c - \theta)\,[1 - F_w(\tau_c - \theta)]}{p_w^2(\tau_c - \theta)},$$

from which we infer that the ultimate performance limit is determined by the distance between $\tau_c$ and $\theta$. For the particular case of Gaussian noise, we can define $\Delta_c := (\tau_c - \theta)/\sigma$ as the $\sigma$-distance between the parameter $\theta$ and the threshold $\tau_c$, measured in standard deviation units, and let $Q(\cdot)$ denote the standardized Gaussian ccdf. Since the noise is Gaussian, the sample mean estimator $\bar{x}$ is the MVUE with variance $\mathrm{var}(\hat{\theta}_0) = \sigma^2/K$. Compared with this benchmark estimator, the one in (4) incurs a loss measured by the ratio

$$\frac{B(\theta)}{\mathrm{var}(\hat{\theta}_0)} = \frac{2\pi\, Q(\Delta_c)\,[1 - Q(\Delta_c)]}{e^{-\Delta_c^2}} \le \frac{\pi}{2}\, e^{\Delta_c^2/2},$$

which we depict in Figure 4 versus $\Delta_c$ [38], [39].
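To get a feel for the curve in Figure 4, the Gaussian-noise penalty $2\pi Q(\Delta_c)[1 - Q(\Delta_c)]e^{\Delta_c^2}$ can be evaluated directly. The short sketch below does so with the Python standard library; the function name and the sample values of $\Delta_c$ are chosen here only for illustration.

```python
import math
from statistics import NormalDist


def one_bit_penalty(delta_c):
    """Ratio B(theta) / var(sample mean) for Gaussian noise."""
    q = 1.0 - NormalDist().cdf(delta_c)  # Q(delta_c), standard Gaussian ccdf
    return 2.0 * math.pi * q * (1.0 - q) * math.exp(delta_c ** 2)


for delta in (0.0, 0.5, 1.0, 2.0):
    print(f"delta_c = {delta:3.1f}  penalty = {one_bit_penalty(delta):8.3f}")
```

At $\Delta_c = 0$ the expression reduces to $2\pi \cdot \tfrac{1}{4} = \pi/2$, and it grows quickly as the threshold moves away from the parameter.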

Figure 4 reveals something unexpected: relying on a single bit per $x_k$, the estimator in (4) may exhibit a variance only $\pi/2$ times higher than that of the clairvoyant $\hat{\theta}_0 = \bar{x}$, which relies on the nonquantized data $x_k$. But this minimal loss in performance corresponds to the ideal choice $\Delta_c = 0$, which implies $\tau_c = \theta$ and requires perfect knowledge of the unknown $\theta$ for selecting the quantization threshold $\tau_c$. How do we select $\tau_c$, and how much do we lose when the unknown $\theta$ lies anywhere in $(-\infty, \infty)$, or when $\theta$ lies in $[\Theta_1, \Theta_2]$, with $\Theta_1$, $\Theta_2$ finite and known a priori? Intuition suggests selecting the threshold as close as possible to the parameter. This can be realized with an iterative estimator $\hat{\theta}^{(i)}$, which can be formed as in (4), using $\tau_c^{(i)} = \hat{\theta}^{(i-1)}$, the parameter estimate from the previous $(i-1)$st iteration. As we will see later, this iterative threshold placement matches nicely with state estimation of dynamical processes based on binary observations.
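The iterative threshold placement $\tau_c^{(i)} = \hat{\theta}^{(i-1)}$ can be sketched as a simple loop. The version below is only an illustration under assumptions that the article does not spell out: Gaussian noise with known $\sigma$, $K$ fresh observations per round, an error-free feedback link that delivers the new threshold to every sensor, and clipping of the empirical frequency away from 0 and 1. All names are hypothetical.

```python
import random
from statistics import NormalDist


def one_bit_mle(messages, tau_c, sigma):
    """Closed-form 1-b MLE for Gaussian noise (clipping is an assumption)."""
    K = len(messages)
    q_hat = min(max(sum(messages) / K, 0.5 / K), 1.0 - 0.5 / K)
    return tau_c - sigma * NormalDist().inv_cdf(1.0 - q_hat)


def iterate_threshold(theta, sigma, tau_init, K, rounds, seed=0):
    """Re-center the threshold on the previous estimate in every round."""
    rng = random.Random(seed)
    tau = tau_init
    history = []
    for _ in range(rounds):
        # Assumption: each round uses K new samples and the FC broadcasts
        # the updated threshold back to the sensors without error.
        m = [1 if theta + rng.gauss(0.0, sigma) > tau else 0 for _ in range(K)]
        tau = one_bit_mle(m, tau, sigma)
        history.append(tau)
    return history


estimates = iterate_threshold(theta=0.0, sigma=1.0, tau_init=4.0, K=5000, rounds=4)
print(estimates)
```

Starting four standard deviations away from the parameter, the first round is dominated by the clipping rule, and later rounds operate with the threshold close to the parameter.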

But in the batch formulation considered herein, selecting $\tau_c$ is challenging; a closer look at $B(\theta)$ confirms that the loss can be huge if $\tau_c - \theta \gg 0$. The implication of the latter is twofold:

i) since the loss shows up in the CRLB, the potentially high variance of estimators based on quantized observations is inherent to the possibly severe bandwidth limitations of the problem itself and is not unique to a particular estimator;

ii) how successful the $\tau_c$ selection is depends on the dynamic range $|\Theta_1 - \Theta_2|$, which makes sense because the latter affects the error due to the quantization of $x_k$ to $m_k$. In fact, two sources of error are present in joint quantization-estimation problems: quantization and noise.

To account for both, the proper figure of merit for estimators based on binary observations is the quantization SNR (Q-SNR) that we define as [39]

$$\gamma := \frac{|\Theta_1 - \Theta_2|^2}{\sigma^2}. \tag{5}$$

Notice that contrary to common wisdom, the smaller the Q-SNR is, the easier it becomes to select $\tau_c$ judiciously. Furthermore, the variance increase in $B(\theta)$ relative to the variance of the clairvoyant $\hat{\theta}_0$ is smaller, for a given $\sigma$. This is because as the Q-SNR increases the problem becomes more difficult in general, but the rate at which the estimation variance increases is smaller for the CRLB in Figure 4 than for $\mathrm{var}(\hat{\theta}_0) = \sigma^2/K$.


KNOWN NOISE PDF WITH UNKNOWN PARAMETERS

The estimator in (4) requires perfect knowledge of the noise pdf $p_w(w)$, which may not always be available. Here we suppose that $p_w(w) = p_w(w; \boldsymbol{\psi})$ is known and depends on the parameter vector $\boldsymbol{\psi} \in \mathbb{R}^{L \times 1}$, which is unknown. Consider, for example, the case frequently encountered in practice in which the noise pdf is known (say Gaussian) except for its variance $E(w_k^2) = \sigma^2$. Note that the problem of estimating $\theta$ when the noise pdf is $p_w(w; \sigma)$ can be addressed by writing $x_k = \theta + \sigma v_k$ with $E(v_k^2) = 1$ and estimating $\theta$ while viewing $\sigma$ as a nuisance parameter [39].

If we define a single quantization region as in "Distributed Estimators," different combinations $(\theta, \sigma)$ lead to sets of messages $\{m_k\}_{k=1}^{K}$ with identical probabilities. To avoid this ambiguity problem, we define two regions $B_j := (\tau_j, \infty)$, $j = 1, 2$, with $\tau_1 < \tau_2$ and let half the sensors use $B_1$ to construct their binary observations and the remaining half use $B_2$. Accordingly, the messages are defined as $m_k := \mathbf{1}\{x_k \in (\tau_1, \infty)\}$ for $k \in [1, K/2]$ and $m_k := \mathbf{1}\{x_k \in (\tau_2, \infty)\}$ for $k \in [K/2 + 1, K]$.

As in the previous subsection, the $m_k$ messages are Bernoulli with parameters $q_j := \Pr\{x_k \in B_j\} = F_v[(\tau_j - \theta)/\sigma]$, depending on whether sensor $S_k$ uses threshold $\tau_1$ or $\tau_2$ to construct $m_k$; here $F_v$ and $p_v$ denote the ccdf and pdf of the unit-variance noise $v_k$. These expressions for the Bernoulli parameters imply that $(\theta, \sigma)$ and $(q_1, q_2)$ are related by the nonlinear $2 \times 2$ mapping $[q_1, q_2]^T = [F_v[(\tau_1 - \theta)/\sigma],\ F_v[(\tau_2 - \theta)/\sigma]]^T$, which can be inverted to express $(\theta, \sigma)$ in terms of $(q_1, q_2)$. This, plus the invariance property of MLEs, leads to [39]

$$\hat{\theta}_{\mathrm{MLE}} = \frac{F_v^{-1}(\hat{q}_2)\,\tau_1 - F_v^{-1}(\hat{q}_1)\,\tau_2}{F_v^{-1}(\hat{q}_2) - F_v^{-1}(\hat{q}_1)}, \tag{6}$$

where the MLEs of $q_1$, $q_2$ can be found as $\hat{q}_1 = (2/K) \sum_{k=1}^{K/2} m_k$ and $\hat{q}_2 = (2/K) \sum_{k=K/2+1}^{K} m_k$. Likewise, we can obtain the MLE of $\sigma$ if we are interested in the noise power.

Upon defining the $\sigma$-distances $\Delta_j := (\tau_j - \theta)/\sigma$ and the ratios $B_j(\theta) := F_v(\Delta_j)[1 - F_v(\Delta_j)]/p_v^2(\Delta_j)$, the CRLB can be written as

$$\frac{B(\theta)}{\sigma^2/K} = \frac{2}{(\Delta_2 - \Delta_1)^2}\left[\Delta_2^2\, B_1(\theta) + \Delta_1^2\, B_2(\theta)\right].$$

Interestingly, $B(\theta)$ is a linear combination of $B_1(\theta)$, $B_2(\theta)$, which are identical to the ratio depicted in Figure 4. This establishes that the variance penalty with respect to the clairvoyant sample mean estimator is still a relatively small factor when the Q-SNR takes small-to-medium values [39].

The approach here can be generalized to noise pdfs that depend on $L$ parameters by defining $L + 1$ regions $B_l := (\tau_l, \infty)$ and dividing the sensors into $L + 1$ groups so that the $l$th group constructs its binary observations as $m_k = \mathbf{1}\{x_k \in B_l\}$. (See Figure 5.) This applies when the noise adheres to, e.g., a Gaussian mixture pdf. In all these cases we find that a low-complexity MLE can be constructed in closed form by invoking the invariance property of MLEs. The associated normalized penalty $\mathrm{var}(\hat{\theta}_{\mathrm{MLE}})/\mathrm{var}(\hat{\theta}_0)$ is small when $\gamma$ is. Even when the noise pdf is completely unknown, we can develop nonparametric universal estimators sharing the latter property, as we show later.
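The two-threshold estimator in (6) can be tried out numerically. The sketch below assumes that $v_k$ is standard Gaussian, so that $F_v^{-1}(q)$ is the standard normal quantile at $1 - q$; the companion estimate of $\sigma$ follows from the same $2 \times 2$ inversion and is derived here rather than quoted from the article. Function names, parameter values, and the clipping of the empirical frequencies are assumptions made for the example.

```python
import random
from statistics import NormalDist


def inv_ccdf(q):
    """Inverse ccdf of a unit-variance Gaussian v_k (assumed noise shape)."""
    return NormalDist().inv_cdf(1.0 - q)


def two_threshold_mle(m_low, m_high, tau_1, tau_2):
    """Estimator (6) plus the matching noise-scale estimate."""
    def clipped_mean(bits):
        n = len(bits)
        # Assumption: clip away from 0 and 1 so inv_ccdf stays finite.
        return min(max(sum(bits) / n, 0.5 / n), 1.0 - 0.5 / n)

    a1 = inv_ccdf(clipped_mean(m_low))   # estimates (tau_1 - theta) / sigma
    a2 = inv_ccdf(clipped_mean(m_high))  # estimates (tau_2 - theta) / sigma
    # Note: a2 == a1 (equal empirical frequencies) is not handled here.
    theta_hat = (a2 * tau_1 - a1 * tau_2) / (a2 - a1)
    sigma_hat = (tau_2 - tau_1) / (a2 - a1)
    return theta_hat, sigma_hat


rng = random.Random(1)
theta, sigma, tau_1, tau_2, K = 1.0, 2.0, 0.0, 2.0, 40000
x = [theta + sigma * rng.gauss(0.0, 1.0) for _ in range(K)]
m_low = [1 if xk > tau_1 else 0 for xk in x[: K // 2]]   # first half uses tau_1
m_high = [1 if xk > tau_2 else 0 for xk in x[K // 2:]]   # second half uses tau_2
theta_hat, sigma_hat = two_threshold_mle(m_low, m_high, tau_1, tau_2)
print(theta_hat, sigma_hat)
```

Neither estimate uses the true noise standard deviation; only the two thresholds and the received bits enter the computation.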


VECTOR  PARAMETERS  IN  COLORED  GAUSSIAN  NOISE 

Results presented earlier can be extended to the vector signal model (1). For illustrative purposes, let us assume that the noise pdf $p_k(w)$ and the corresponding ccdf $F_k(w)$ are known but may change from sensor to sensor, and recall that we denote the noise power as $E(w_k^2) = \sigma_k^2$.

As before, we define 1-b messages $m_k := \mathbf{1}\{x_k \in (\tau_k, \infty)\}$ and note that $m_k$ is Bernoulli distributed with parameter $q_k := \Pr\{x_k \in (\tau_k, \infty)\} = F_k[\tau_k - \phi_k(\boldsymbol{\theta})]$. Defining the log-likelihood function

$$L(\boldsymbol{\theta}) := \sum_{k=0}^{K-1} \left[ m_k \ln q_k + (1 - m_k) \ln(1 - q_k) \right], \tag{7}$$

we can find the MLE of $\boldsymbol{\theta}$ based on observations $\{m_k\}_{k=0}^{K-1}$ as $\hat{\boldsymbol{\theta}}_{\mathrm{MLE}} := \arg\max_{\boldsymbol{\theta}} \{L(\boldsymbol{\theta})\}$ [39].

The search for $\hat{\boldsymbol{\theta}}_{\mathrm{MLE}}$ can be challenging due to the generally multimodal nature of $L(\boldsymbol{\theta})$ as well as the numerical difficulties caused by $q_k$ values being close to zero. However, if a1) the noise pdfs $p_k(w)$ are log-concave, and a2) the functions $\phi_k(\boldsymbol{\theta})$ are linear, then the log-likelihood $L(\boldsymbol{\theta})$ becomes a concave function of $\boldsymbol{\theta}$. The concavity of $L(\boldsymbol{\theta})$ implies that computationally efficient search algorithms, e.g., interior point methods, are guaranteed to converge to the global maximum $\hat{\boldsymbol{\theta}}_{\mathrm{MLE}}$. Note that a1) is satisfied by common noise pdfs, including the multivariate Gaussian, uniform in a convex set, as well as generalized Gaussian [7, p. 104]; while a2) is typical in parameter estimation. Moreover, even when a2) is not satisfied, linearizing $\phi_k(\boldsymbol{\theta})$ using Taylor's expansion is a common first step, typical in, e.g., parameter tracking applications.
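The following sketch illustrates how concavity simplifies the search in the simplest setting: a scalar $\theta$, linear maps $\phi_k(\theta) = h_k \theta$, and Gaussian noise with a common known $\sigma$. It maximizes the log-likelihood (7) with a plain ternary search, which is substituted here for the interior point methods mentioned in the text because it needs no external library. The gains $h_k$, the thresholds $\tau_k$, and all function names are hypothetical choices for the example.

```python
import math
import random


def log_q(z):
    """ln Q(z) for the standard Gaussian ccdf, using erfc for accurate tails."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def log_likelihood(theta, m, tau, h, sigma):
    """Log-likelihood (7) for phi_k(theta) = h[k] * theta and Gaussian noise."""
    total = 0.0
    for mk, tk, hk in zip(m, tau, h):
        z = (tk - hk * theta) / sigma
        # q_k = Q(z) and 1 - q_k = Q(-z); the second form avoids cancellation.
        total += log_q(z) if mk else log_q(-z)
    return total


def maximize_concave(f, lo, hi, iters=60):
    """Ternary search; valid because f is concave on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)


rng = random.Random(0)
K, sigma, theta_true = 2000, 1.0, 0.7
h = [1.0 if k % 2 == 0 else 2.0 for k in range(K)]   # hypothetical sensor gains
tau = [rng.uniform(-1.0, 1.0) for _ in range(K)]     # hypothetical thresholds
m = [1 if hk * theta_true + rng.gauss(0.0, sigma) > tk else 0
     for hk, tk in zip(h, tau)]
theta_mle = maximize_concave(
    lambda t: log_likelihood(t, m, tau, h, sigma), -3.0, 3.0)
print(theta_mle)
```

Writing $1 - q_k$ as $Q(-z)$ instead of subtracting from one is one way to sidestep the numerical difficulties that arise when $q_k$ is close to zero or one.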

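To quantify the performance penalty in the vector case, we define the equivalent noise powers

$$\rho_k^2 := \frac{F_k(\tau_k - \phi_k(\boldsymbol{\theta}))\,[1 - F_k(\tau_k - \phi_k(\boldsymbol{\theta}))]}{p_k^2[\tau_k - \phi_k(\boldsymbol{\theta})]},$$

and consider two signal models according to (1) with the noise powers given by $\boldsymbol{\sigma} := [\sigma_0^2, \ldots, \sigma_{K-1}^2]^T$ and $\boldsymbol{\rho} := [\rho_0^2, \ldots, \rho_{K-1}^2]^T$, respectively. It can be shown that the CRLB $B_x(\boldsymbol{\theta}; \boldsymbol{\rho})$ when estimating $\boldsymbol{\theta}$ based on $\{x_k\}_{k=0}^{K-1}$ with noise powers given by $\boldsymbol{\rho}$ coincides with the CRLB $B_m(\boldsymbol{\theta}; \boldsymbol{\sigma})$ associated with the estimation of $\boldsymbol{\theta}$ based on $\{m_k\}_{k=0}^{K-1}$ when the noise powers are the components of $\boldsymbol{\sigma}$ [38], [39]. Equivalently, it follows that the performance of a centralized estimator when the noises have variances $\rho_k^2$ coincides with the performance of a single-bit distributed estimator when the noise variances are $\sigma_k^2$, with the ratio $\rho_k^2/\sigma_k^2$ characterized as in Figure 4.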

Even though we considered scalar observations so far, the results generalize to vector observations $\mathbf{x}_k = \boldsymbol{\phi}_k(\boldsymbol{\theta}) + \mathbf{w}_k$ as long as the components of $\mathbf{w}_k$ are independent (e.g., white Gaussian noise). If $\mathbf{w}_k$ is Gaussian but colored, the approach described here can also be used after local prewhitening [39].

[FIG5] When the noise pdf $p_w(w; \boldsymbol{\psi})$ is known but depends on $L$ unknown parameters, we divide the sensors into $L + 1$ groups, each using a different threshold $\tau_l$ to construct the binary message $m_k$. Note that the messages are Bernoulli distributed with parameters $q_l := \Pr\{x_k \in (\tau_l, \infty)\} = F_w(\tau_l - \theta)$ when sensor $S_k$ uses the threshold $\tau_l$.

UNIVERSAL  APPROACHES 

As shown earlier, optimal distributed estimators depend on the parametric model and the noise pdfs. In certain cases though, characterizing the exact sensor observation distributions for a large number of sensors may be impossible, especially in a dynamic sensing environment. Such applications motivate universal distributed estimators that are independent of the noise or parameter distributions, under either bandwidth or energy constraints.

ESTIMATION  IN  A  HOMOGENEOUS  ENVIRONMENT 

Again, let us consider a WSN with an FC and the signal model (3) where the $w_k$ are spatially uncorrelated with zero mean but otherwise unknown. For the moment, let us also assume that all wireless channels are orthogonal and distortionless. As discussed earlier, if the sensors could communicate their real-valued observations to the FC error free, then the sample mean estimator achieves an MSE performance of $\sigma^2/K$, implying that the WSN has an estimation capability that scales linearly with network size $K$. We have seen that under a rate constraint of one binary bit per sensor sample, the same $O(1/K)$ scaling law remains valid when the noise pdf is completely or partially known. Surprisingly, this scaling law can even be achieved by universal distributed estimators, as we explain below.

The idea is to represent sensor observations in binary form and quantize them to different bit positions across sensors. Specifically, we can have 1/2 of the sensors quantize their observations to the first most significant bit (MSB), 1/4 of the sensors quantize their observations to the second MSB, 1/8 of the sensors quantize their observations to the third MSB, and so on [27]. The resulting bits are then used as the 1-b messages for individual sensors. The FC simply averages the received 1-b messages to generate an estimate of $\theta$. Clearly, this distributed estimator is universal as it is completely independent of the noise pdf. Assume all observations $x_k$ are bounded in an interval $[-W, W]$ and they are conditionally independent given $\theta$. Then, the mean of these message functions $(m_1 + m_2 + \cdots + m_K)/K$ estimates $\theta$ with an MSE upper bounded by $W^2/K$. Notice that this estimation scheme assigns more sensors to estimate the first MSB of $\theta$ than any other bit. This is intuitively reasonable since getting the first MSB of $\theta$ right has the highest impact on minimizing the final MSE.

One limitation of the aforementioned strategy is that it requires the use of an FC and knowledge of network size $K$ to specify which sensor should quantize its observation to which bit. Moreover, the resultant estimator is nonisotropic in the sense that sensors quantize their observations to possibly different MSBs. This is difficult to implement in an ad hoc sensor network where there is little or no coordination among sensors. For such networks we can use the following probabilistic estimation scheme [28]:

■ With each new sample $x_k$, sensor $k$ flips a coin and, with probability 1/2, quantizes $x_k$ to the first MSB, with probability 1/4 quantizes $x_k$ to the second MSB, and so on. The quantization outcome is sent to all its neighbors.

■ Messages are communicated among sensors via an underlying WSN protocol. Each sensor recursively computes the average of all received binary messages that are distinct (determined by, say, the sender's ID), and uses it as an estimator of $\theta$.

Intuitively, with the aforementioned coin flipping per sensor, there will be roughly 1/2 of the sensors in the network quantizing their observations to the first MSB, about 1/4 of the sensors in the network quantizing their observations to the second MSB, and so on. Thus, this probabilistic estimation scheme should closely approximate the MSE performance of the previous nonprobabilistic one. This is indeed the case. Assume that each message has a header containing the sender's ID and eventually arrives at its destination without error. Then each node in the WSN produces an unbiased estimate of $\theta$ with an MSE of at most $4W^2/(K_1 + 1)$, where $K_1$ denotes the number of distinct messages received by this sensor. Notice that this probabilistic estimator is isotropic and robust in the sense that all sensors operate identically and independently and remain oblivious to possible changes in the network size or the noise pdf. This probabilistic distributed estimator can also be adapted for distributed detection [48].
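The coin-flipping rule can be simulated from the point of view of a single node that has collected $K_1$ distinct messages. The sketch below is only illustrative and rests on assumptions not stated in the text: each observation is first normalized to $y = (x + W)/(2W) \in [0, 1)$, the "$i$th MSB" is read as the $i$th bit of the binary expansion of $y$, the bit index is capped at a hypothetical `max_bits`, and the averaged bits are mapped back to $[-W, W]$ by an affine rescaling. The noise is taken uniform on $[-1, 1]$ merely to keep the observations bounded.

```python
import math
import random


def coin_flip_bit(x, W, rng, max_bits=40):
    """One sensor's 1-b message: a randomly chosen bit of (x + W) / (2W)."""
    y = min(max((x + W) / (2.0 * W), 0.0), 1.0 - 2.0 ** -max_bits)
    # Draw bit position i with probability 2^-i (capped at max_bits).
    i = 1
    while i < max_bits and rng.random() < 0.5:
        i += 1
    return math.floor(y * 2.0 ** i) % 2  # i-th bit of the binary expansion


def fuse(bits, W):
    """Average the distinct 1-b messages and map [0, 1] back to [-W, W]."""
    return 2.0 * W * sum(bits) / len(bits) - W


rng = random.Random(0)
W, theta, K1 = 4.0, 1.0, 100000
bits = [coin_flip_bit(theta + rng.uniform(-1.0, 1.0), W, rng) for _ in range(K1)]
theta_hat = fuse(bits, W)
print(theta_hat)
```

Because bit $i$ is selected with probability $2^{-i}$, which equals its weight in the binary expansion, the expected value of each message equals the expected value of $y$, so a plain average suffices and no sensor needs to know the network size or the noise pdf.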

The performance of a universal estimator is characterized by the worst case MSE over all possible distributions of the observations $x_k$ with support $[-W, W]$. Given the binary nature of messages, the message functions must take the form $m_k(x_k) = \mathbf{1}\{x_k \in S_k\}$, indicating whether the observation $x_k$ belongs to $S_k$ or not, where $S_k$ is a subset of $\mathbb{R}$. The design of an optimal 1-bit universal estimator then amounts to choosing $\{S_1, \ldots, S_K; \Gamma\}$ such that $\max_{p_x} E(|\hat{\theta} - \theta|^2)$ is minimized, where the maximum is taken over all pdfs $p_x$ supported on $[-W, W]$. An example in [53] shows that $\max_{p_x} E(|\hat{\theta} - \theta|^2) \ge W^2/(4K)$. Thus the best achievable MSE for single-bit universal estimators is at least $W^2/(4K)$, which implies that the performance of the universal estimators in [27] and [28] is within a constant factor of 4 of being optimal.

Beyond  the  simple  1-b  per  sensor  observation,  universal  esti¬ 
mators  can  be  derived  for  any  fixed  rate,  and  channel  distortions 
can  also  be  accounted  for  [27]. 


In such inhomogeneous environments, it is no longer reasonable to insist on having each sensor transmit the same number of bits to the FC. Intuitively, sensors with higher local SNRs should send more bits to the FC, and these bits should carry more weight than those from sensors with lower SNRs. To this end, we can let each sensor compress its observation to a discrete message with length proportional to the logarithm of its local SNR and then transmit the resulting message to the FC. The final estimate of the unknown parameter is computed at the FC by combining the received bits according to a universal fusion rule. The following distributed estimator in an inhomogeneous sensing environment is proposed in [51].

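■ At sensor $k$, choose

$$L_k = \left\lceil \log_2 \frac{W}{\sigma_k} \right\rceil \qquad (8)$$

and take $m_k$ to be the first $L_k$ bits of the binary expansion of
$(W + x_k)/(2W) \in [0, 1]$.

■ The final estimator at the FC is

$$\hat{\theta} = \left( \sum_{k=1}^{K} 2^{2L_k} \right)^{-1} \sum_{k=1}^{K} 2^{2L_k}\, W (2 m_k - 1). \qquad (9)$$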

To form the estimator in (9), each sensor only needs to
know its own noise variance to determine the number of
bits $L_k$. The final fusion (9) is completely determined by the
received messages. Thus, such an estimation scheme is
totally distributed and easily implemented in a WSN. As
expected, higher quality sensors with smaller noise variance
send more bits and their messages carry more weight in the
final fusion process. Notice that $\hat{\theta}$ in (9) is unbiased, i.e.,
$E(\hat{\theta}) = \theta$, with MSE

$$E(\hat{\theta} - \theta)^2 \le \frac{25}{8} \left( \sum_{k=1}^{K} \frac{1}{\sigma_k^2} \right)^{-1},$$

which is optimal (up to a factor of 3.125) when compared to the
centralized BLUE estimator.
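The two steps of the scheme can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of [51]: it assumes the message length takes the form $L_k = \lceil \log_2(W/\sigma_k) \rceil$, represents each message $m_k$ by the truncated binary fraction it encodes, and uses hypothetical function names (`message_length`, `quantize`, `fuse`).

```python
import math

def message_length(W, sigma):
    """Bits sent by a sensor with noise std sigma (assumed form of (8))."""
    return max(0, math.ceil(math.log2(W / sigma)))

def quantize(x, W, L):
    """Keep the first L bits of the binary expansion of (W + x) / (2W)."""
    u = min(max((W + x) / (2.0 * W), 0.0), 1.0)
    levels = 2 ** L
    # Truncate after L bits; the min() keeps u = 1 representable with L bits.
    return min(math.floor(u * levels), levels - 1) / levels

def fuse(messages, lengths, W):
    """Fusion rule (9): weight each decoded message by 2^(2 L_k)."""
    weights = [4.0 ** L for L in lengths]
    total = sum(w * W * (2.0 * m - 1.0) for w, m in zip(weights, messages))
    return total / sum(weights)

# Two noise-free sensors observing x = 4 with W = 8 (illustrative values).
lengths = [message_length(8.0, s) for s in (2.0, 0.5)]
messages = [quantize(4.0, 8.0, L) for L in lengths]
estimate = fuse(messages, lengths, 8.0)
```

The sensor with the smaller noise standard deviation is assigned more bits, and its message receives a larger weight at the FC, which mirrors the behavior described above.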

An example from [51] comparing on the basis of MSE the
universal estimator in (9) with the centralized BLUE estimator
is shown in Figure 6. The asymptotic efficiency is defined as

$$\text{asymptotic efficiency} = \frac{1}{\mathrm{MSE} \cdot \sum_{k=1}^{K} \sigma_k^{-2}}.$$

ESTIMATION  IN  INHOMOGENEOUS  ENVIRONMENTS 

In an inhomogeneous sensing environment, different sensors
may have different quality of observations due to the fact that
sensors closer to the target may have a higher local SNR than
those farther away. While characterizing the pdf of sensor observations
is difficult in practice, it is often possible for each sensor
to characterize its local SNR. This can be accomplished by comparing
the received signal power with and without the presence
of the signal of interest $\theta$.


Clearly, the larger the asymptotic efficiency, the more efficient
is the estimation scheme. In all the simulation runs, we take
$\theta = 1$, $W = 9$. Sensor noises are uniformly distributed with
standard deviations shown in Figure 7. The distributions of the
number of bits transmitted by all $K$ sensors are plotted in
Figure 8. This shows that the universal estimator in (9)
requires a surprisingly low communication overhead (about
3.8 b per sample on average) and achieves essentially the same
order of MSE as the centralized BLUE estimator.


IEEE SIGNAL PROCESSING MAGAZINE [33] JULY 2006


[FIG6] MSE performance of the universal distributed estimator versus the
number of sensors K (curves: universal distributed estimator, centralized
BLUE, and universal estimator upper bound).


[FIG7]  Distribution  of  sensor  noise  standard  deviations. 


[FIG8]  Distribution  of  the  number  of  bits  transmitted  by 
local  sensors. 


ENERGY  MINIMIZING  ESTIMATION 

The estimation schemes so far rely on the idea of adapting the
bit allocation depending on the observation SNR. For the purpose
of energy efficiency (which has been a design
criterion for almost all aspects of WSN design; see [2], [8],
[25], and [36]), a sensor should choose a message length of
$L_k = 0$ if the quality of its channel to the FC is very poor, even
if the quality of its observation is high. Thus, to maximize
energy savings, it is necessary to adapt the message length $L_k$
not only to the local observation SNRs but also to
the quality of the intended channel. The work in [54] examined an
energy minimizing estimation problem by modeling the wireless
links between sensors and the FC as additive white
Gaussian noise (AWGN) channels with known path gains $g_k$.
Sensors adopt uncoded quadrature amplitude modulation
(QAM) for the quantized bits. Energy models for uncoded
M-QAM transmissions are available in [13], [14], and [21]. If sensor
$k$ sends $L_k$ bits with QAM of constellation size $2^{L_k}$ at a given bit
error probability, then the total amount of required transmission
energy is given by


where the scaling constant is fixed by the system. To achieve a target distortion
$D_0$ and minimize the total sensor transmission power, the
sensor scheduling problem can be formulated as a convex program,
and the optimal value of $L_k$ can be derived in terms of
$(\sigma_k^2, g_k)$ as [54]

$$L_k = \log_2\!\left(1 + \frac{W}{\sigma_k}\sqrt{(\eta_0 g_k - 1)^+}\right), \qquad (10)$$

where $\eta_0$ is a universal constant decided jointly by the target
MSE, sensor noise levels, and channel gains.

The message length in (10) is intuitively appealing as it
indicates that the message length should be proportional to
the logarithm of the local SNR scaled by the channel path
gain. This is in the same spirit as the message length formula
in (8) when the channels are ideal, although the latter was
derived from a different perspective. Also notice that when
$\eta_0 g_k \le 1$, we have $L_k = 0$, and therefore $P_k = 0$. Since $g_k$ is
the channel gain, this implies that when the channel gain
of sensor $k$ falls below the threshold $1/\eta_0$, we should discard
its observation to save energy. Such a strategy of discarding
observations for the purpose of energy saving has been proposed
in the context of censoring sensors [37].
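The censoring behavior can be illustrated with a short Python sketch. It assumes that the message length has the form $L_k = \log_2(1 + (W/\sigma_k)\sqrt{(\eta_0 g_k - 1)^+})$; the function names and numerical values are hypothetical, $\eta_0$ is treated as a given input rather than solved for, and the real-valued lengths are not rounded to integers.

```python
import math

def optimal_message_length(W, sigma, gain, eta0):
    """Assumed form of (10); returns 0 when eta0 * gain <= 1 (censored sensor)."""
    excess = max(eta0 * gain - 1.0, 0.0)  # the (.)^+ operation
    return math.log2(1.0 + (W / sigma) * math.sqrt(excess))

def schedule(W, sigmas, gains, eta0):
    """Message lengths for all sensors, given noise stds and channel gains."""
    return [optimal_message_length(W, s, g, eta0) for s, g in zip(sigmas, gains)]

# Illustrative values only: three sensors with equal noise, different channels.
lengths = schedule(9.0, [1.0, 1.0, 1.0], [0.05, 0.2, 1.0], eta0=10.0)
active = [k for k, L in enumerate(lengths) if L > 0]
```

In this toy setting the first sensor has $\eta_0 g_k = 0.5$, so it transmits nothing, while the two sensors with better channels remain active and the one with the largest gain sends the longest message.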

To obtain the desired quantization and transmit power levels,
we have assumed in this article that the fusion center knows
$\{(\sigma_k^2, g_k) : k = 1, 2, \ldots, K\}$. This assumption is reasonable in
cases where the network condition and the signal being estimated
change slowly in a quasi-static manner. Thus, once $\{(\sigma_k^2, g_k)\}$
are acquired by the fusion center, they can be used for a reasonably
long period of time. Also, our approach can be generalized
to the estimation of a memoryless discrete-time random process
$\theta(t)$. Due to the temporal memoryless property of the source
and sensor observations, we can impose sample-by-sample estimation
without significant estimation performance loss but
obtain important features such as easy implementation and no
coding and estimation delay.

We now present some simulation results from [54]. In
the simulations, the parameters are chosen as $K = 1{,}000$,
$\sigma_k^2 = 1$ for all $k$, and the channel path loss coefficients
$a_k = g_k^{-1} = d_k^{\alpha}$ with $d_k \in [1, 10]$ and $\alpha \in [2, 6]$. Figure 9(a)
illustrates that as the WSN heterogeneity increases, a large
number of sensors with low channel gains or low-SNR
observations will transmit nothing (i.e., $L_k = 0$). Figure 9(b)
also reveals major energy savings compared to uniform
quantization or uniform power scheduling. Besides adaptive
quantization, other interesting strategies such as protecting
different bits with different bit error rates have been discussed
in [26].


BAYESIAN  ESTIMATION  OF  RANDOM  SIGNALS 

When knowledge about the parameter of interest is available in
the form of a prior distribution $p(\theta)$, we can pose our distributed
estimation problem in a Bayesian framework. Consider the
signal model in (1) and define the messages as
$m_k := \mathbf{1}\{x_k \in (\tau_k, \infty)\}$. Letting $m_{1:K} := [m_1, \ldots, m_K]$ denote
the message sequence, the minimum mean-square error
(MMSE) estimator can be found as the conditional mean of the
posterior distribution, $\hat{\theta} = E[\theta | m_{1:K}]$, with $p[\theta | m_{1:K}] =
p(m_{1:K} | \theta)\, p(\theta) / p(m_{1:K})$ obtained through Bayes' theorem.

Since computing the conditional expectation requires
prohibitively expensive numerical integrations, we consider the
maximum a posteriori (MAP) estimator $\hat{\theta}_{\mathrm{MAP}} =
\arg\max_{\theta} p[\theta | m_{1:K}]$ that requires numerical maximization
instead of numerical integration [17]. Given that $p(m_{1:K})$ does
not depend on $\theta$ and the logarithm is a monotonically increasing
function, $\hat{\theta}_{\mathrm{MAP}}$ can be found as the maximizer given in (11).


JOINT  ESTIMATION  OF  A  VECTOR  SOURCE 

The universal distributed estimation can be extended to the general
signal model (1) with vector observations. For illustrative
purposes, we consider the observation model $\mathbf{x}_k = \mathbf{H}_k \boldsymbol{\theta} + \mathbf{w}_k$,
where $\mathbf{H}_k$ is a matrix with dimension $r_k \times p$. We assume that the
noise $\mathbf{w}_k$ has zero mean and covariance matrix $\mathbf{C}_k$ but is otherwise
unknown. Noises $\mathbf{w}_k$ are spatially uncorrelated across sensors.
Without loss of generality, the source covariance matrix $E(\boldsymbol{\theta}\boldsymbol{\theta}^T)$
is assumed to be identity.

It is possible to extend the universal estimators described
earlier to this vector model [29]. There are two main steps in
this extension. First, at each sensor, the dimension of $\mathbf{x}_k$ can be
reduced by adopting the dimensionality reduction strategy
proposed in [30]. It turns out that to perform the centralized
BLUE estimator, each sensor only needs to send to the FC a
number of real-valued messages equal to $\mathrm{rank}(\mathbf{H}_k^T \mathbf{C}_k^{-1} \mathbf{H}_k)$. Second,
after reducing the dimension of $\mathbf{x}_k$, a universal quantization is performed
on each component, with the number of bits jointly determined
by the pair of local matrices $(\mathbf{C}_k, \mathbf{H}_k)$. In particular, to
stay within a factor of 2 of the performance of the centralized
BLUE, the number of bits that must be sent from each
sensor to the FC is on average no more than
$L_k = (1/2) \log\det(\mathbf{I} + \mathbf{H}_k^T \mathbf{C}_k^{-1} \mathbf{H}_k)$ binary bits.

The quantity $(1/2) \log\det(\mathbf{I} + \mathbf{H}_k^T \mathbf{C}_k^{-1} \mathbf{H}_k)$ coincides with
Shannon's capacity of a "virtual AWGN channel" from
nature to sensor $k$ with channel matrix given by $\mathbf{H}_k$, noise
covariance matrix $\mathbf{C}_k$, and input power given by the identity
matrix. The fact that $L_k$ represents channel capacity shows
nicely that the message length is decided by the number of
"useful" bits contained in $\mathbf{x}_k = \mathbf{H}_k \boldsymbol{\theta} + \mathbf{w}_k$. Furthermore,
this message length function is reminiscent of that in (8)
for the scalar case.
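To see the connection with the scalar case, consider a brief specialization. This is a sketch under the illustrative assumptions $r_k = p = 1$, $\mathbf{H}_k = 1$, and $\mathbf{C}_k = \sigma_k^2$, with the unit-variance source normalization adopted above:

```latex
\[
L_k = \tfrac{1}{2}\log\det\bigl(1 + \sigma_k^{-2}\bigr)
    = \tfrac{1}{2}\log\bigl(1 + \sigma_k^{-2}\bigr)
    \approx \log\frac{1}{\sigma_k}
    \quad \text{when } \sigma_k \ll 1 .
\]
```

Under these assumptions the message length grows with the logarithm of the ratio between signal amplitude and noise standard deviation, i.e., with the logarithm of the local SNR, which is the same dependence the scalar scheme uses.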

This vector source estimation problem has also been studied
in the context of linear decentralized estimation in [31],
[43], [55], and [56] for the purpose of dimensionality reduction
and power control under both orthogonal and
nonorthogonal multiple access.


$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \left\{ \log[p(m_{1:K} | \theta)] + \log[p(\theta)] \right\}, \qquad (11)$$


where $\log[p(m_{1:K} | \theta)]$ coincides with the log-likelihood function
in (7). If the noise pdf is log-concave, it can be proved
that $\log[p(m_{1:K} | \theta)]$ is a concave function of $\theta$ [17], a result
that we already mentioned. Furthermore, if the prior $p(\theta)$ is
also log-concave, then $\log[p(\theta)]$ is concave by definition
and, thus, $\hat{\theta}_{\mathrm{MAP}}$ can be found as the maximum of a concave
function.

[FIG9] (a) Number of active sensors decreases as the channel
path losses become more heterogeneous. (b) Power savings
compared to uniform power scheduling or uniform quantization.

Since $\hat{\theta}_{\mathrm{MAP}} \to \hat{\theta}_{\mathrm{MLE}}$ for $K$ sufficiently large, the performance of
the MAP estimator approaches that of the MLE we discussed
earlier. In fact, it turns out that the penalty paid by the MAP estimator
in (11) relative to the clairvoyant MAP based on analog-amplitude
observations is smaller than the penalty paid by the
respective MLEs [17].
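Because the objective is concave whenever both the noise pdf and the prior are log-concave, a simple one-dimensional search suffices to locate the MAP estimate. The sketch below assumes, purely for illustration, Gaussian noise with known standard deviation `sigma`, a Gaussian prior, and binary messages $m_k = \mathbf{1}\{x_k > \tau_k\}$ with $x_k = \theta + w_k$; the function names and the choice of ternary search are not taken from [17].

```python
import math

def log_message_prob(m, theta, tau, sigma):
    """log P(m | theta) for m = 1{theta + w > tau}, with w ~ N(0, sigma^2)."""
    z = (theta - tau) / (sigma * math.sqrt(2.0))
    # erfc keeps the tail probabilities accurate for moderately large |z|.
    p = 0.5 * math.erfc(-z) if m == 1 else 0.5 * math.erfc(z)
    return math.log(p)

def log_posterior(theta, messages, thresholds, sigma, prior_mean, prior_std):
    """Log-likelihood plus Gaussian log-prior (up to an additive constant)."""
    total = -0.5 * ((theta - prior_mean) / prior_std) ** 2
    for m, tau in zip(messages, thresholds):
        total += log_message_prob(m, theta, tau, sigma)
    return total

def map_estimate(messages, thresholds, sigma, prior_mean, prior_std,
                 lo, hi, iters=200):
    """Ternary search on [lo, hi]; valid because the objective is concave."""
    args = (messages, thresholds, sigma, prior_mean, prior_std)
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if log_posterior(a, *args) < log_posterior(b, *args):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)

# Balanced messages around a zero threshold pull the estimate to the prior mean.
balanced = map_estimate([1, 0, 1, 0], [0.0] * 4, 1.0, 0.0, 1.0, -5.0, 5.0)
positive = map_estimate([1, 1, 1, 1], [0.0] * 4, 1.0, 0.0, 1.0, -5.0, 5.0)
```

The search interval should stay within a few tens of noise standard deviations of the thresholds so that the tail probabilities do not underflow.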

DISTRIBUTED  KALMAN  FILTERING 

Consider an ad hoc WSN deployed to estimate the state of a
dynamic stochastic process. Let $n$ denote the time index,
$\mathbf{x}(n) \in \mathbb{R}^{p \times 1}$ the state at time $n$, $y(n, k) \in \mathbb{R}$ the scalar observation
of sensor $S_k$ at time $n$, and consider the following state-observation
model

$$\begin{aligned}
\mathbf{x}(n) &= \mathbf{A}(n)\mathbf{x}(n-1) + \mathbf{u}(n) \\
y(n, k) &= \mathbf{h}^T(n, k)\mathbf{x}(n) + v(n, k),
\end{aligned} \qquad (12)$$


SIGN  OF  INNOVATIONS-KF 

We wish to derive a distributed KF whereby observations
made at each sensor are used to update state estimates at all
sensors. Our goal, though, is to ensure that the required
exchange of information among sensors entails low communication
overhead. To this end, we use as messages $m(n)$ the
sign of the innovation (SOI):

$$m(n) := \mathrm{sign}[\tilde{y}(n)] = \mathrm{sign}[y(n) - \hat{y}(n|n-1)]. \qquad (14)$$

Note that quantizing $\tilde{y}(n)$ to the SOI $m(n)$ only alters the
observation model and consequently the prediction step for
the SOI-KF coincides with the prediction step for the clairvoyant
KF. The sign nonlinearity, though, implies that
$p[\mathbf{x}(n) | m_{0:n-1}]$ is non-Gaussian and computation of the exact
MMSE estimate requires (computationally expensive) numerical
integrations and (memory intensive) propagation of the
posterior pdf. However, based on customary simplifications
made in nonlinear filtering, we can approximate the MMSE estimate
with the correction recursions given in (15) [40].


In the model (12), the matrix $\mathbf{A}(n) \in \mathbb{R}^{p \times p}$, the vector $\mathbf{h}(n, k) \in \mathbb{R}^{p \times 1}$, the
driving input $\mathbf{u}(n)$ is normally distributed with zero mean and
covariance $\mathbf{C}_u(n)$, and the observation noise $v(n, k)$ is zero-mean
AWGN, independent across sensors, with variance $\sigma_v^2(n, k)$.
Supposing that $\mathbf{A}(n)$, $\mathbf{C}_u(n)$, $\mathbf{h}(n, k)$, and $\sigma_v^2(n, k)$ are available
for all $n, k$, the goal of the WSN is for each sensor to form an
estimate of $\mathbf{x}(n)$.

Without loss of generality, we assume that sensors broadcast
their data in a time division multiple access (TDMA) fashion with
$k(n)$ indexing the sensor scheduled at the $n$th time slot; for simplicity
we denote $S_{k(n)} = S(n)$. If we had infinite bandwidth available,
the sensor $S(n)$ scheduled for transmission would
communicate its observation $y(n, k(n)) = y(n)$ to all other sensors.
Having the entire set of observations $\mathbf{y}_{0:n} :=
[y(0), \ldots, y(n)]^T$ available, each sensor would then be able to
obtain the MMSE estimate $\hat{\mathbf{x}}(n|n) := E[\mathbf{x}(n) | \mathbf{y}_{0:n}]$ and its
corresponding error covariance matrix $\mathbf{M}(n|n) :=
E[(\hat{\mathbf{x}}(n|n) - \mathbf{x}(n))(\hat{\mathbf{x}}(n|n) - \mathbf{x}(n))^T]$ by means of Kalman filtering
(KF) iterations, each of which includes a prediction step and a correction
step [24, Chap. 13].

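Supposing that $\hat{\mathbf{x}}(n-1|n-1)$ and $\mathbf{M}(n-1|n-1)$ are available
at time $n$, it follows from the linear model in (12) that the
predicted estimate $\hat{\mathbf{x}}(n|n-1)$ and its corresponding covariance
matrix $\mathbf{M}(n|n-1)$ are given by

$$\begin{aligned}
\hat{\mathbf{x}}(n|n-1) &= \mathbf{A}(n)\hat{\mathbf{x}}(n-1|n-1) \\
\mathbf{M}(n|n-1) &= \mathbf{A}(n)\mathbf{M}(n-1|n-1)\mathbf{A}^T(n) + \mathbf{C}_u(n).
\end{aligned} \qquad (13)$$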

Following this prediction step we use the innovation
sequence $\tilde{y}(n) := y(n) - \mathbf{h}^T(n)\hat{\mathbf{x}}(n|n-1)$ to obtain the corrected
estimate $\hat{\mathbf{x}}(n|n)$ using the well-known KF correction;
see, e.g., [24, Sec. 13.6]. The innovation $\tilde{y}(n)$ represents the
information about the state contained in the current observation
that cannot be predicted from past observations.


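$$\begin{aligned}
\hat{\mathbf{x}}(n|n) &= \hat{\mathbf{x}}(n|n-1) + m(n)\,
\frac{\sqrt{2/\pi}\;\mathbf{M}(n|n-1)\mathbf{h}(n)}{\sqrt{\mathbf{h}^T(n)\mathbf{M}(n|n-1)\mathbf{h}(n) + \sigma_v^2(n)}} \\
\mathbf{M}(n|n) &= \mathbf{M}(n|n-1) -
\frac{(2/\pi)\,\mathbf{M}(n|n-1)\mathbf{h}(n)\mathbf{h}^T(n)\mathbf{M}(n|n-1)}{\mathbf{h}^T(n)\mathbf{M}(n|n-1)\mathbf{h}(n) + \sigma_v^2(n)}.
\end{aligned} \qquad (15)$$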


Even at a minimal communication cost, the SOI-KF is
strikingly similar to the clairvoyant KF. The covariance
updates in particular are identical except for the $2/\pi$ factor
in (15). We emphasize that (15) is not the result of proposing
a KF-like recursion based on a priori heuristics. On the
contrary, the SOI-KF implements MMSE estimation based
on the SOI in (14) whose form ends up being, a posteriori,
reminiscent of the KF.
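One SOI-KF iteration can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the correction uses the square-root-normalized gain and the $2/\pi$ covariance factor described above, matrices are stored as nested lists, `M_pred` is taken to be symmetric, and all function names are hypothetical rather than taken from [40].

```python
import math

def matvec(M, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def predict(x, M, A, Cu):
    """Prediction step (13): x_pred = A x, M_pred = A M A^T + Cu."""
    p = len(x)
    x_pred = matvec(A, x)
    AM = [[sum(A[i][k] * M[k][j] for k in range(p)) for j in range(p)]
          for i in range(p)]
    M_pred = [[sum(AM[i][k] * A[j][k] for k in range(p)) + Cu[i][j]
               for j in range(p)] for i in range(p)]
    return x_pred, M_pred

def soi_message(y, x_pred, h):
    """Sign of innovation (14), encoded as +1 or -1 (one bit)."""
    y_pred = sum(a * b for a, b in zip(h, x_pred))
    return 1 if y - y_pred >= 0 else -1

def soi_correct(x_pred, M_pred, h, sigma2_v, m):
    """Correction step (15) driven by the one-bit message m."""
    Mh = matvec(M_pred, h)
    s = sum(a * b for a, b in zip(h, Mh)) + sigma2_v  # innovation variance
    gain = math.sqrt(2.0 / math.pi) / math.sqrt(s)
    x = [xi + m * gain * g for xi, g in zip(x_pred, Mh)]
    p = len(h)
    M = [[M_pred[i][j] - (2.0 / math.pi) * Mh[i] * Mh[j] / s
          for j in range(p)] for i in range(p)]
    return x, M

# Scalar sanity check with illustrative values.
x_new, M_new = soi_correct([0.0], [[1.0]], [1.0], 1.0, 1)
```

Every sensor can run `predict` and `soi_correct` locally, since the only quantity that must be communicated per time slot is the single bit returned by `soi_message`.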

While the MSE corrections of the KF and SOI-KF are similar,
the estimate updates for $\hat{\mathbf{x}}(n|n)$ appear to be quite different.
However, it is possible to express the SOI-KF corrector in (15) in
a form that exemplifies its link with the KF corrector. Indeed, if
we define the SOI-KF innovation sequence as
$\tilde{m}(n|n-1) := \sqrt{(2/\pi)E[\tilde{y}^2(n|n-1)]}\; m(n)$, it is not difficult to
show that the SOI-KF correction can be written as


$$\hat{\mathbf{x}}(n|n) = \hat{\mathbf{x}}(n|n-1) +
\frac{\mathbf{M}(n|n-1)\mathbf{h}(n)}{\mathbf{h}^T(n)\mathbf{M}(n|n-1)\mathbf{h}(n) + \sigma_v^2(n)}\;
\tilde{m}(n|n-1), \qquad (16)$$

which is identical to the KF update if we replace $\tilde{m}(n|n-1)$
with the innovation $\tilde{y}(n|n-1) = y(n) - \hat{y}(n|n-1)$.
Moreover, note that the units of $\tilde{m}(n|n-1)$ and $\tilde{y}(n|n-1)$ are
the same, and that $E[\tilde{m}(n|n-1)] = E[\tilde{y}(n|n-1)] = 0$. Even
more interesting, by definition it holds that $E[\tilde{m}^2(n|n-1)] =
(2/\pi)E[\tilde{y}^2(n|n-1)]$, which explains the relationship between
the covariance corrections for the KF and for the SOI-KF in
(15). The difference between the SOI-KF and KF corrections is
that in the SOI-KF the magnitude of the correction at each
step is determined by the magnitude of $E[\tilde{m}^2(n|n-1)]$, and it
is the same regardless of how large or small the actual innovation
$\tilde{y}(n|n-1)$ is.
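A short check makes the equivalence explicit. It is written in the notation above and assumes that the innovation variance of the linear model is $E[\tilde{y}^2(n|n-1)] = \mathbf{h}^T(n)\mathbf{M}(n|n-1)\mathbf{h}(n) + \sigma_v^2(n)$; substituting the definition of $\tilde{m}(n|n-1)$ into the KF-like form recovers a gain normalized by the square root of that variance:

```latex
\begin{align*}
\tilde{m}(n|n-1)
  &= \sqrt{\tfrac{2}{\pi}\bigl(\mathbf{h}^T(n)\mathbf{M}(n|n-1)\mathbf{h}(n) + \sigma_v^2(n)\bigr)}\; m(n), \\
\frac{\mathbf{M}(n|n-1)\mathbf{h}(n)}
     {\mathbf{h}^T(n)\mathbf{M}(n|n-1)\mathbf{h}(n) + \sigma_v^2(n)}\,\tilde{m}(n|n-1)
  &= \frac{\sqrt{2/\pi}\;\mathbf{M}(n|n-1)\mathbf{h}(n)}
          {\sqrt{\mathbf{h}^T(n)\mathbf{M}(n|n-1)\mathbf{h}(n) + \sigma_v^2(n)}}\; m(n).
\end{align*}
```

Since $m^2(n) = 1$, squaring the first line and taking expectations also gives $E[\tilde{m}^2(n|n-1)] = (2/\pi)E[\tilde{y}^2(n|n-1)]$ directly.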

SOI-KF  IMPLEMENTATION  AND  MSE  PERFORMANCE 

Implementation of the SOI-KF requires running two separate
algorithms. The observation-transmission algorithm is run by
the sensors as dictated by the scheduling algorithm and starts
by collecting the observation $y(n, k) = y(n)$. The sensor then
computes the state and observation predictions $\hat{\mathbf{x}}(n|n-1)$
and $\hat{y}(n|n-1) = \mathbf{h}^T(n)\hat{\mathbf{x}}(n|n-1)$. Based on $\hat{y}(n|n-1)$,
it obtains the SOI as in (14) and broadcasts it
to all other sensors as the message $m(n)$. The reception-estimation
algorithm is continuously run by all sensors to track
$\mathbf{x}(n)$ and is identical to a KF algorithm except for the (minor)
differences in the update equations. At each time slot, the state
prediction is computed using (13) and, after receiving the SOI
$m(n)$, the corrected estimate is obtained using (15).

The MSE performance of the SOI-KF can be related to that
of the KF by defining an equivalent system that is identical to
the model in (12) except that the observation noise power
at time $n$ is $(\pi/2)\sigma_v^2(n)$. It turns out that the steady-state MSE
of the clairvoyant KF run on this equivalent system basically
coincides with the steady-state MSE of a SOI-KF run on the
original system [40]. In other words, the MSE increase when
using the SOI-KF is as much as the KF would incur when
applied to a model with $\pi/2$ times higher observation noise variance.
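This relationship can be explored numerically with a toy example. The sketch below assumes a scalar random-walk state ($A = h = 1$) with hypothetical values for the driving-noise variance `q` and the observation-noise variance `sigma2`; it iterates the covariance recursions until they settle and compares the resulting steady-state values.

```python
import math

def steady_state_mse(q, sigma2, soi, iters=5000):
    """Iterate the scalar covariance recursion (A = h = 1) to steady state."""
    M = 1.0  # initial M(n|n); the fixed point does not depend on it
    for _ in range(iters):
        M_pred = M + q  # prediction step (13)
        if soi:
            # SOI-KF covariance correction, with the 2/pi factor of (15)
            M = M_pred - (2.0 / math.pi) * M_pred ** 2 / (M_pred + sigma2)
        else:
            # Standard KF covariance correction
            M = M_pred - M_pred ** 2 / (M_pred + sigma2)
    return M

q, sigma2 = 0.01, 1.0  # illustrative values only
mse_soi = steady_state_mse(q, sigma2, soi=True)
mse_kf_equiv = steady_state_mse(q, (math.pi / 2.0) * sigma2, soi=False)
mse_kf = steady_state_mse(q, sigma2, soi=False)
```

For these values the SOI-KF and the KF on the equivalent system settle within a few percent of each other, and both are above the steady-state MSE of the clairvoyant KF on the original system. The agreement is approximate rather than exact, in line with the "basically coincides" wording above.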

While we presented the SOI-KF for scalar observations, generalizations
are available to vector observations and colored noise
after prewhitening [40].


TARGET  TRACKING  WITH  SOI-EKF 

Target tracking based on distance-only measurements is a typical
problem in bandwidth-constrained distributed estimation
with WSNs (see, e.g., [15] and [16]) for which an extension of the
SOI-KF to nonlinear models appears to be particularly attractive.
Consider $K$ sensors randomly and uniformly deployed in a
square region of $2L \times 2L$ meters and suppose that sensor positions
$\{\mathbf{x}_k\}_{k=1}^{K}$ are known.

The WSN is deployed to track the position
$\mathbf{x}(n) := [x_1(n), x_2(n)]^T$ of a target, whose state model accounts
for $\mathbf{x}(n)$ and the velocity $\mathbf{v}(n) := [v_1(n), v_2(n)]^T$ but not for the
acceleration, which is modeled as a random quantity. Under these
assumptions, we obtain the state equation [22]

$$\begin{aligned}
\mathbf{x}(n) &= \mathbf{x}(n-1) + T_s \mathbf{v}(n-1) + (T_s^2/2)\,\mathbf{u}(n) \\
\mathbf{v}(n) &= \mathbf{v}(n-1) + T_s \mathbf{u}(n),
\end{aligned} \qquad (17)$$

where $T_s$ is the sampling period and the random vector
$\mathbf{u}(n) \in \mathbb{R}^2$ is zero-mean white Gaussian; i.e., $p(\mathbf{u}(n)) =
\mathcal{N}(\mathbf{u}(n); \mathbf{0}, \sigma_u^2 \mathbf{I})$. The sensors gather information about their
distance to the target by measuring the received power of a
pilot signal following the path-loss model $y_k(n) =
\alpha \log \|\mathbf{x}(n) - \mathbf{x}_k\| + v(n)$ with $\alpha \ge 2$ a constant, $\|\mathbf{x}(n) - \mathbf{x}_k\|$
denoting the distance between the target and $S_k$, and $v(n)$ the
observation noise with pdf $p(v(n)) = \mathcal{N}(v(n); 0, \sigma_v^2)$.

Mimicking an extended (E)KF approach, we linearize this
observation model in a neighborhood of $\hat{\mathbf{x}}(n|n-1)$ to obtain an
approximate observation model that along with the state evolution
in (17) is of the form (12). We can now use the SOI-KF to
track the target's position $\mathbf{x}(n)$, which offers a version of the EKF
with low communication cost. The results of simulating this
tracker (that we abbreviate as SOI-EKF) are depicted in Figure
10, where we see that the SOI-EKF succeeds in tracking the target
with position errors less than 10 m. While this accuracy is
just a result of the specific experiment, the important point here
is that the clairvoyant EKF and the SOI-EKF yield almost identical
performance even when the former relies on analog-amplitude
observations and the SOI-EKF on the transmission of a
single bit per sensor.

[FIG10] Target tracking with EKF and SOI-EKF.
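The linearization step can be sketched as follows. The sketch assumes that the logarithm in the path-loss model is the natural logarithm and that the state is ordered as $[x_1, x_2, v_1, v_2]$; the function names and the sample geometry are hypothetical.

```python
import math

def state_matrix(Ts):
    """Transition matrix for the state [x1, x2, v1, v2] implied by (17)."""
    return [[1.0, 0.0, Ts, 0.0],
            [0.0, 1.0, 0.0, Ts],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def predicted_observation(pos_pred, sensor_pos, alpha):
    """Noise-free path-loss reading alpha * ln(distance) at the predicted position."""
    dx = pos_pred[0] - sensor_pos[0]
    dy = pos_pred[1] - sensor_pos[1]
    return alpha * math.log(math.hypot(dx, dy))

def linearized_h(pos_pred, sensor_pos, alpha):
    """Gradient of alpha * ln||x - x_k|| at the predicted position.

    The velocity entries are zero because the observation depends on
    position only.
    """
    dx = pos_pred[0] - sensor_pos[0]
    dy = pos_pred[1] - sensor_pos[1]
    r2 = dx * dx + dy * dy
    return [alpha * dx / r2, alpha * dy / r2, 0.0, 0.0]

# Illustrative geometry: predicted target at (3, 4), sensor at the origin.
h = linearized_h([3.0, 4.0], [0.0, 0.0], alpha=2.0)
y_pred = predicted_observation([3.0, 4.0], [0.0, 0.0], alpha=2.0)
```

The vector `h` plays the role of $\mathbf{h}(n)$ in the linear model, and the sign of the difference between the measured power and `y_pred` is the single bit that the scheduled sensor broadcasts.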

INFORMATION  THEORETIC  PERSPECTIVES 

In  this  section,  we  study  the  WSN  illustrated  in  Figure  1  by 
approaching  the  source  acquisition,  data  communication,  and 
final  fusion  processes  from  an  information  theoretic  point  of  view. 

As depicted in Figure 11, the source parameter of interest is
modeled by a discrete-time memoryless random process
$\{\theta(t) : 1 \le t < \infty\}$. Sensor observations are denoted by
$\{X_k(t) : k = 1, 2, \ldots, K\}$, and their joint conditional distribution
(given the source $\theta(t)$) is known. A general coding scheme
with block length $n$ can be described as follows. First, sensor
observations $X_k^n := \{X_k(t) : 1 \le t \le n\}$ are encoded in a distributed
fashion ($f_k$ denotes the encoder of the $k$th sensor). Then a
single decoder $g$ decodes $\theta^n := \{\theta(t) : 1 \le t \le n\}$ based on the
received information from the distributed encoders. In what follows,
we will discuss two cases where the encoders are designed
under either rate or cost constraints.


[FIG11]  A  coding  scheme  in  a  WSN  with  an  FC. 

SOURCE  CODING  UNDER  RATE  CONSTRAINTS 

When rate constraints are imposed on encoded messages, we
obtain a source-coding problem in which the goal is to characterize
the rate-distortion region $\mathcal{R}(D)$. The latter consists of all
rate-tuples $R = (R_1, R_2, \ldots, R_K)$ that allow for the reconstruction
of the source $\theta$ within a certain distortion level $D$, when the
sensor observations are encoded at a rate not exceeding $R_k$ per
sensor $k \in [1, K]$. This is the so-called CEO problem that was
first introduced in [5] and subsequently studied in [9], [33], and
[46]. A natural source coding scheme can be described as follows.
Each sensor encoder first quantizes its observation; these
quantized processes are then losslessly transmitted to the
decoder using the random binning scheme [11]. The decoder
uses these quantized processes to form its reproductions. This
leads to the Berger-Tung inner region [4], [45]. Except for inner
and outer bounds on $\mathcal{R}(D)$ derived in [4] and [45], and the
quadratic Gaussian case addressed in [33], the CEO problem
remains open to this date.

The $\mathcal{R}(D)$ region serves as a performance benchmark for
distributed estimation under bandwidth constraints. In the
Gaussian quadratic CEO problem, [33] derived an asymptotic
total rate distortion function of the form $D \approx \sigma^2/(2R_\Sigma)$ when
both $K$ and $R_\Sigma$ are large, where $R_\Sigma$ is the total rate and $\sigma^2$ is
the sensor noise variance. For the special case of 1 b per sensor
sample, the total communication rate is $R_\Sigma = K$, and thus the
best achievable MSE performance dictated by rate distortion
theory is no less than $\sigma^2/(2K)$. Recall that the distributed estimators
achieve an asymptotic MSE of $\pi\sigma^2/(2K)$ when the
threshold of the local message can be taken close to $\theta$. This MSE is
a factor $\pi$ away from the performance limit predicted by the
rate-distortion function. Also, the universal estimators in
"Universal Approaches" exhibit an MSE of $W^2/K$ that has the correct
asymptotic behavior with respect to the network size $K$.
This implies that the simple distributed estimators all have the
optimal scaling behavior in terms of network size $K$. In contrast,
the information theoretic schemes suggested by [4], [33], and
[45] require complete knowledge of source/observation distributions
as well as long block lengths to achieve the optimal
MSE performance predicted by rate distortion theory.

In the inhomogeneous case where local sensor SNRs are not
identical, specifying the optimal rate allocation minimizing the
sum rate $R_\Sigma$ in the rate distortion region is also an interesting
problem. It is well known that the optimal rate allocation
that attains the sum rate distortion function is not unique; the set
of optimal allocations is actually a polymatroid with $K!$ vertices [9].
The optimal rate allocation region can be found through optimal Gaussian
test channels and is given in [9] for the case of scalar source and
observations. Vector sources and vector observations have been
studied in [30] and [41]. The optimal rate allocation strategy is
equivalent to searching for optimal covariance matrices in
AWGN test channels and can be interpreted as a distributed
Karhunen-Loève transform [19] problem. The rate allocation
problem with distributed Karhunen-Loève transform is nonconvex,
but suboptimal coordinate descent algorithms for optimizing
each sensor's local covariance matrix have been
proposed in [19] and [42]. Recently, [52] reformulated the original
problem in an equivalent convex form that is efficiently solvable
by interior point methods [7]. A lower bound on the sum
rate distortion function of Gaussian multiterminal source
coding has also been proposed in [50] by considering the joint
compression of correlated Gaussian sources under individual
distortion criteria.

Figure 12 plots three rate distortion curves: i) a lower
bound on the sum rate distortion function assuming full
cooperation among the encoders; ii) an upper bound on the
sum rate distortion function using the so-called EC (estimation-first
compression-second) scheme introduced in [42];
and iii) the optimal sum rate distortion function that is calculated
by solving the convex problem formulated in [52]
with the numerical MAXDET routine [47]. Details of the
simulation setup can be found in [52].

ON  THE  OPTIMAL  COST-DISTORTION  TRADEOFF 

When channels from sensors to the FC are noisy, we need to
introduce cost constraints on the transmitted symbols from
each individual sensor. Such constraints may include power
constraints as a special case. In this way, it is possible to cast distributed
estimation as a source-channel communication problem.
The fundamental objective of the latter is to determine the
optimal tradeoff between cost and distortion in an information
theoretic sense regardless of complexity and delay.

In several important cases, source coding and channel coding
can be separated without performance loss. For example,
in a point-to-point link, source and channel coding can be
performed separately without performance degradation if the
source and channel are both discrete and memoryless [44,
Theorem 21]. This source-channel separation theorem is
quite appealing from a practical standpoint since it implies
that source coding can be performed without channel knowledge
and similarly for channel encoding. Unfortunately, the
separation theorem does not extend to general links [11,
Chapter 14]. An interesting counterexample can be found in
[12] for lossless transmission of correlated sources through
an interfering (nonorthogonal) multiple access channel. In
this case, separating source from channel coding is suboptimal
(see also [20]).

However, for the sensor network in Figure 1, if the intersensor
interference is resolved by reservation-based orthogonal protocols
(e.g., TDMA or FDMA) and local sensors have
noninterfering channels to the FC, it turns out that the optimal
tradeoff between cost and distortion can be achieved by separate
source and channel coding [49]. Proving the separation theorem
in this case entails a multiple-letter characterization of the
rate-distortion region. By combining this multiletter representation
of $\mathcal{R}(D)$ with a continuity property under orthogonal multiaccess,
a cost distortion pair $(\Gamma, D)$ is achievable if and only if

$$\mathcal{C}(\Gamma) \cap \mathcal{R}(D) \neq \emptyset,$$

where $\mathcal{C}(\Gamma)$ is the multiaccess channel capacity. This work
extends the results of [3] and [23] for the lossless transmission
of correlated sources from finite alphabets through an orthogonal
multiple access channel to the rate distortion case for the
WSN in Figure 1.

EXAMPLE  (GAUSSIAN  SENSOR  NETWORKS) 

For the special case of estimating a Gaussian source using
MSE as the distortion measure, we can compare the power distortion
region achieved by the separation principle with
those achieved by joint source-channel coding strategies.


In particular, it is shown in [8] that when each sensor has a
fixed power budget and sensors are accessing the channel
synchronously, the distortion achieved by optimal separate
source and channel coding decreases at a rate $1/\log K$,
while for simple “analog” uncoded transmission, the
MSE decreases like $1/K$, which is much faster. However,
since the separation theorem holds with orthogonal access,
the optimal tradeoff between total sensor power and overall
distortion is achieved by separate source and channel coding.
As a result, the “digital” strategy outperforms the “analog”
uncoded transmission strategy.

[FIG12]  The optimal sum rate distortion function compared to an
EC upper bound and a full-encoder-cooperation lower bound.

[FIG13]  Comparison of the power-distortion regions achieved by
separate source and channel coding and by uncoded
transmission (denoted by V(D) and V_a(D), respectively) when
the total number of sensors K = 2. The regions with dashed
boundaries correspond to one specific pair of Gaussian test
channels with noise variances $\delta_1^2 = 0.53$, $\delta_2^2 = 0.38$.
Exhausting all feasible pairs $(\delta_1^2, \delta_2^2)$ gives the
complete regions.

This result should be contrasted with the case of nonorthogonal
multiple access, for which the “analog” uncoded transmission
strategy is known to significantly outperform the
digital approach of separate source and channel coding. An
example of the achieved power-distortion regions for the
separation principle and uncoded analog transmission is
depicted in Figure 13.
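
To give a feel for the gap between the two decay orders quoted from [8] for synchronous access, the sketch below tabulates $1/\log K$ against $1/K$ for a few network sizes. It only compares the orders of growth: the constants are set to one and the natural logarithm is used, both of which are our assumptions, so the numbers are not actual distortion values. The function names are hypothetical.

```python
import math


def digital_order(num_sensors: int) -> float:
    """Order 1/log K of the MSE under separate source and channel coding."""
    return 1.0 / math.log(num_sensors)


def analog_order(num_sensors: int) -> float:
    """Order 1/K of the MSE under uncoded analog transmission."""
    return 1.0 / num_sensors


# Tabulate both orders and their ratio for growing network sizes.
for k in (10, 100, 1000, 10000):
    ratio = digital_order(k) / analog_order(k)
    print(f"K={k:6d}  1/log K={digital_order(k):.4f}  "
          f"1/K={analog_order(k):.6f}  ratio={ratio:.1f}")
```

The ratio between the two orders is $K/\log K$, which grows without bound, so the advantage of uncoded transmission in that setting widens as sensors are added. Under orthogonal access the ordering is reversed, as stated in the text.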

CLOSING  REMARKS 

This article provided an overview of distributed compression-estimation
problems encountered with WSNs. A general formulation
of distributed compression-estimation under rate
constraints was introduced, pertinent signal processing algorithms
were developed, and emerging tradeoffs were delineated
from an information theoretic perspective. Specifically, we
designed rate-constrained distributed estimators for various
signal models with variable knowledge of the underlying data
distributions. We proved theoretically, and corroborated with
examples, that whether the noise distributions are completely
known, partially known, or completely unknown, distributed
estimation is possible with minimal bandwidth
requirements while achieving the same order of MSE performance
as the corresponding centralized clairvoyant estimators.
We also considered a distributed state estimation problem
in the context of WSNs, in which prior information about the
parameter of interest is available and estimation relies on the
sign of innovations. For WSNs operating in inhomogeneous
environments, we presented resource allocation and sensor
scheduling algorithms that can result in considerable cost savings
and MSE improvement.

We have not considered the interaction of routing with
our distributed compression-estimation framework. This
interaction, along with further cross-layer optimized protocols
accounting for all layers in the stack, is worth further
investigation and is expected to improve the overall design of
distributed estimators using WSNs.